Usually, the study of city population distribution has been reduced to power laws. In such analyses, 
a common practice is to consider only cities with more than one hundred thousand inhabitants. Here, 
we argue that the distribution of cities over the whole range of populations can be well described by 
a q-exponential distribution. This function, which reproduces the Zipf-Mandelbrot law, is related 
to generalized nonextensive statistical mechanics and satisfies an anomalous decay equation. 

PACS number(s): 89.90.+n, 89.65.-s, 05.20.-y 

In several areas of nature, despite their complexity, it 
is possible to identify macroscopic regularities that can 
be well described by simple laws. Examples include the 
frequency of words in a long text [1], forest fires [2], the 
distribution of species lifetimes for North American 
breeding bird populations [3], scientific citations [4,5], 
www surfing [6], ecology [7], solar flares [8], football goal 
distribution [9], economic indices [10], and epidemics in 
isolated populations [11], among others. 

In particular, interest in the study of city population 
distribution has recently increased. This interest is 
related to the analysis of data and to models that present 
asymptotic power-law behavior [12]-[16]. However, in such 
analyses, only cities with more than one hundred thousand 
inhabitants have been considered. This power-law behavior 
can be identified in terms of the distribution 






$$N(x)\,dx \propto x^{-\alpha}\,dx \qquad (1)$$



which gives the number of cities with between $x$ and 
$x + dx$ inhabitants, where $\alpha$ is a positive constant. 
Another way to express the same relation is in terms of the 
relative number (rank or cumulative distribution) of cities 
with a population larger than a certain value $x$, 



$$r(x) = \int_x^{\infty} N(y)\,dy \propto x^{1-\alpha}. \qquad (2)$$



By expressing the population $x(n)$ of the cities in 
descending order ($x(1)$ being the city with the highest 
population, $x(2)$ the city with the second highest 
population, and so on), it follows from Eq. (2) that 

$$x(n) \propto n^{1/(1-\alpha)}. \qquad (3)$$

The plot of $x(n)$ on a double logarithmic scale is called 
a "Zipf plot" [1] and leads to a straight line with slope 
$1/(1-\alpha)$. Note that the Zipf plot (from Eq. (3)) and 
the cumulative plot (from Eq. (2)) are equivalent, except 
for the weight given to the rare (largest) elements. 
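
The link between the cumulative exponent and the slope of the Zipf plot can be checked numerically. The following is a minimal sketch, not the procedure used for the figures: it builds hypothetical city sizes placed at the exact quantiles of a pure power law (the function names, the exponent value, and the sample size are all illustrative assumptions) and recovers the slope $1/(1-\alpha)$ by least squares.

```python
import math

def zipf_slope(populations):
    """Least-squares slope of ln x(n) versus ln n (n = descending rank)."""
    ranked = sorted(populations, reverse=True)
    log_n = [math.log(n) for n in range(1, len(ranked) + 1)]
    log_x = [math.log(x) for x in ranked]
    mean_n = sum(log_n) / len(log_n)
    mean_x = sum(log_x) / len(log_x)
    cov = sum((u - mean_n) * (v - mean_x) for u, v in zip(log_n, log_x))
    var = sum((u - mean_n) ** 2 for u in log_n)
    return cov / var

def power_law_sizes(alpha, count, x_min=1.0):
    """Hypothetical sizes at exact quantiles of N(x) ~ x^(-alpha), x >= x_min."""
    # The fraction of cities larger than x is (x / x_min)^(1 - alpha);
    # setting that fraction to n / count and solving for x gives x(n).
    return [x_min * (n / count) ** (1.0 / (1.0 - alpha))
            for n in range(1, count + 1)]

alpha = 2.5  # illustrative exponent (assumption), not a fitted value
sizes = power_law_sizes(alpha, count=1000)
slope = zipf_slope(sizes)
print(slope, 1.0 / (1.0 - alpha))
```

With real data the largest cities are few and noisy, so the fitted slope depends on how the low ranks are weighted, which is the difference between the two plots noted above.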



FIG. 1. (a) Zipf plot for cities with populations above 
one hundred thousand (legend: Europe, India, USA, Brazil); 
inset: cumulative plot for the same cities in Europe. 
(b) Zipf plot for all cities in the USA and Brazil. In these 
plots, $x$ is the population of the cities, $n$ is the 
descending rank, and $r$ is the cumulative rank. 

The Zipf plot for cities with more than one hundred 
thousand inhabitants [17] in some countries and in Europe 
is shown in Fig. 1(a). These plots let us visualize how 
well the power law describes the city population 
distribution for large cities. The inset of Fig. 1(a) shows 
the cumulative plot for the same cities in Europe. However, 
only a small fraction of cities have more than one hundred 
thousand inhabitants. For instance, these cities represent 
about 15% of American cities and 4% of Brazilian cities. 
Furthermore, if we take into account all cities [18,19] in 
a country, the Zipf plot, Fig. 1(b), shows a marked 
deviation from the asymptotic power law when cities with 
small populations are considered. Thus, an analysis that 
considers all cities is an important task, and this work is 
dedicated to an empirical analysis of this question. 

An alternative approach to incorporating the deviation 
from a power law is employed in Ref. [20], which considers 
the stretched exponential (Weibull) distribution, 
$N(x) = N_0\, x^{c-1} \exp(-\lambda x^{c})$, to fit data from 
several complex systems. In particular, for city formation, 
the authors also show a fit to cities with populations above 
one hundred thousand inhabitants, using a kind of Zipf plot 
of $x^{c}$ versus $\ln(n)$, where $c$ is an adjustable 
parameter. However, the Weibull distribution gives a poor 
fit to the complete data set, i.e., it gives a satisfactory 
fit only over a restricted range of the data. Furthermore, 
the stretched exponential does not lead to an asymptotic 
straight line in a log-log plot, i.e., to a power law. 
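
The last point can be illustrated with a short numerical sketch. It estimates the local slope $d\ln N/d\ln x$ of the stretched exponential by a central difference; the parameter values and helper names are illustrative assumptions, not values fitted to any data set. Analytically the slope is $(c-1) - \lambda c\, x^{c}$, which keeps decreasing as $x$ grows instead of settling at a constant.

```python
import math

def stretched(x, n0, lam, c):
    """Stretched exponential (Weibull) form N(x) = N0 x^(c-1) exp(-lam x^c)."""
    return n0 * x ** (c - 1.0) * math.exp(-lam * x ** c)

def local_loglog_slope(f, x, h=1e-4):
    """Central-difference estimate of d ln f / d ln x at x."""
    up = math.log(f(x * math.exp(h)))
    down = math.log(f(x * math.exp(-h)))
    return (up - down) / (2.0 * h)

# Illustrative parameters (assumptions).
n0, lam, c = 1.0, 0.05, 0.5

def density(x):
    return stretched(x, n0, lam, c)

points = (1e1, 1e2, 1e3, 1e4)
slopes = [local_loglog_slope(density, x) for x in points]
for x, s in zip(points, slopes):
    # Compare with the analytic slope (c - 1) - lam * c * x^c.
    print(x, s, (c - 1.0) - lam * c * x ** c)
```

A distribution with a true power-law tail would instead show a local slope that levels off at large $x$.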

On the other hand, the Zipf-Mandelbrot law [21], 
$N(x) = b/(c+x)^{\alpha}$ ($b$, $c$, and $\alpha$ all being 
positive constants), shows curvature in a log-log plot, has 
asymptotic power-law behavior, and is normalizable for 
$\alpha > 1$. In this way, the Zipf-Mandelbrot distribution 
is a natural generalization of an inverse power law. This 
distribution has been applied in many contexts; in 
particular, it was recently employed in the discussion of 
scientific citations [5] and football goal distribution [9]. 
Another important aspect of the Zipf-Mandelbrot distribution 
is that it arises naturally in the context of a generalized 
statistical mechanics proposed some years ago [22]-[25]. In 
this framework, the above distribution is usually rewritten 
as a q-exponential function, 

$$N(x) = N_0 \exp_{q'}(-a x) = N_0 \left[1 - (1-q')\,a x\right]^{1/(1-q')}, \qquad (4)$$

where $N_0 = b\,c^{-\alpha}$, $a = \alpha/c$, and 
$q' = 1 + 1/\alpha$ are positive parameters. Moreover, the 
above distribution has been widely used with $q' < 1$ in 
other contexts [26]. In this case, Eq. (4) is defined to be 
zero when $1 - (1-q')\,a x < 0$ in order to avoid imaginary 
values for $N(x)$. Thus, the distribution (4) is equivalent 
to the Zipf-Mandelbrot law only for $q' > 1$ and gives an 
extension of that law when $q' < 1$ is employed. Note also 
that $\exp_{q'}(-x)$ reduces to the usual exponential 
function, $\exp(-x)$, in the limit $q' \to 1$. In addition, 
Eq. (4) satisfies an anomalous decay equation, 

$$\frac{d}{dx}\left(\frac{N}{N_0}\right) = -a \left(\frac{N}{N_0}\right)^{q'}, \qquad (5)$$

independently of the $q'$ value. Since this equation reduces 
to the usual decay equation in the limit $q' \to 1$, the 
parameter $q'$ can be interpreted as a measure of how 
anomalous the decay is. These aspects put the 
Zipf-Mandelbrot law in a broad context, motivating us to 
employ the generalized Tsallis exponential, Eq. (4), instead 
of the Zipf-Mandelbrot form to study the city population 
distribution. 
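
The properties of Eq. (4) stated above can be checked with a few lines of code. This is a minimal sketch with hypothetical helper names and illustrative constants (none of them fitted values). It verifies the equivalence with the Zipf-Mandelbrot form for $q' > 1$, the cutoff for $q' < 1$, and the decay relation $d(N/N_0)/dx = -a\,(N/N_0)^{q'}$ that follows from differentiating Eq. (4).

```python
import math

def exp_q(u, q):
    """q-exponential [1 + (1 - q) u]^(1 / (1 - q)); ordinary exp for q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(u)
    base = 1.0 + (1.0 - q) * u
    if base <= 0.0:
        if q < 1.0:
            return 0.0  # cutoff that avoids imaginary values for q < 1
        raise ValueError("q-exponential diverges for this argument")
    return base ** (1.0 / (1.0 - q))

def q_exponential_density(x, n0, a, q_prime):
    """Eq. (4): N(x) = N0 * exp_q'(-a x)."""
    return n0 * exp_q(-a * x, q_prime)

def zipf_mandelbrot(x, b, c, alpha):
    """Zipf-Mandelbrot form N(x) = b / (c + x)^alpha."""
    return b / (c + x) ** alpha

# Illustrative constants (assumptions, not fitted values).
b, c, alpha = 3.0, 2.0, 2.5
n0, a, q_prime = b * c ** (-alpha), alpha / c, 1.0 + 1.0 / alpha

x = 7.0
same = math.isclose(q_exponential_density(x, n0, a, q_prime),
                    zipf_mandelbrot(x, b, c, alpha), rel_tol=1e-9)

def ratio(t):
    """N(t) / N0 for the constants above."""
    return exp_q(-a * t, q_prime)

# Central-difference check of d(N/N0)/dx = -a (N/N0)^q'.
h = 1e-5
lhs = (ratio(x + h) - ratio(x - h)) / (2.0 * h)
rhs = -a * ratio(x) ** q_prime
print(same, lhs, rhs)
```

Keeping the $q' < 1$ cutoff inside `exp_q` means the same function covers both regimes discussed in the text.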

The cumulative distribution, for $1 < q' < 1.5$, is 

$$r(x) = r_0 \left[1 - \frac{(1-q)}{q}\,a x\right]^{1/(1-q)}, \qquad (6)$$

where $r_0 = N_0\, q/a$ and $q = (2-q')^{-1}$. Usually, a 
log-log plot is employed to compare this cumulative 
distribution with that obtained from data. Here, we 
introduce another possible way to analyze data by using a 
generalized mono-log plot based on the generalized logarithm 
function, $\ln_q(x) = (x^{1-q} - 1)/(1-q)$. This generalized 
function arises naturally in the framework of Tsallis 
statistics [22], [23], [25] and reduces to the usual 
logarithm, $\ln(x)$, for $q \to 1$. It is easy to verify 
that the plot of $\ln_q[r(x)]$ versus $x$ leads to a 
straight line. So, if the data are well described by the 
distribution (6), we can obtain the $q$ value that gives the 
best linear fit in the generalized mono-log plot, 
independently of the other parameters. 
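
The generalized mono-log analysis can be sketched as a scan over candidate $q$ values, keeping the one for which $\ln_q[r(x)]$ is most nearly linear in $x$. The code below is an illustration under stated assumptions: the data are synthetic (generated from a cumulative q-exponential with made-up parameters), linearity is scored with the squared linear correlation, and the function names are hypothetical. The text does not specify which linearity measure was used.

```python
import math

def ln_q(r, q):
    """Generalized logarithm (r^(1-q) - 1) / (1 - q); ordinary log for q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(r)
    return (r ** (1.0 - q) - 1.0) / (1.0 - q)

def r_squared(xs, ys):
    """Squared linear correlation between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)

def best_q(xs, rs, candidates):
    """Candidate q whose generalized mono-log plot is closest to a line."""
    return max(candidates,
               key=lambda q: r_squared(xs, [ln_q(r, q) for r in rs]))

# Synthetic cumulative ranks from a q-exponential (illustrative parameters).
q_true, r0, a = 1.7, 1000.0, 1e-4
xs = [1000.0 * k for k in range(1, 201)]
rs = [r0 * (1.0 - (1.0 - q_true) * a * x / q_true) ** (1.0 / (1.0 - q_true))
      for x in xs]

candidates = [round(1.0 + 0.1 * k, 1) for k in range(0, 10)]
q_best = best_q(xs, rs, candidates)
fit_quality = r_squared(xs, [ln_q(r, q_best) for r in rs])
print(q_best, fit_quality)
```

Because the transformation depends only on $q$, the scan does not need values for the other parameters, which is the practical advantage described above.
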

FIG. 2. Fit of the cumulative distribution for all cities in the USA. 
The parameters are $q = 1.7$, $r_0 = 2919.4$, and $a = 0.00008$. 
The coefficient of determination of the nonlinear fit is $R^2 = 0.99$. 
Inset: generalized mono-log plot for American cities. 

Here we used this generalized mono-log plot analysis 
and found that $q \approx 1.7$ gives a good fit to all 
American and Brazilian cities. The insets of Figs. 2 and 3 
show this fit for American and Brazilian cities, 
respectively. Note that, in the generalized mono-log plot, 
the two biggest cities lie above the straight line formed by 
all the other cities. This fact is known as the "king" 
effect [20], [27], and occurs because a few cities in some 
countries, for a specific reason (economic, political, 
etc.), compete irregularly to attract people and do not 
follow the same rule as most cities. The dominance of such 
cities over a highly centralized region or country is also 
referred to as the "primate city" effect [28]. Of course, 
this effect can also be observed if one restricts the 
analysis to cities with more than one hundred thousand 
inhabitants. For example, for countries such as England and 
France, the king effect is related to London and Paris [20]. 

By fixing $q = 1.7$, we obtain the other parameters from 
a nonlinear fit of the cumulative distribution. This fit 
is shown in Fig. 2 for American cities and in Fig. 3 
for Brazilian ones. 

In order to analyze the agreement between the data and 
the fitted distribution, beyond what can be seen in 
Figs. 2 and 3, we calculate the total population, 
$p = \int x\,N(x)\,dx$, and the average population per city, 
$\langle x \rangle = \int x\,N(x)\,dx / \int N(x)\,dx$ [29]. 

Comparing $p$ and $\langle x \rangle$ with the values 
obtained from the data, we obtain the deviation 

$$\Delta p = \frac{p_{\rm data} - p_{\rm model}}{p_{\rm data}} \times 100\% = 3.9\%$$

for USA cities. Now, considering only cities with fewer than 
one hundred thousand inhabitants, we have 
$\Delta p_{<} = 4.6\%$, which is better than the value 
obtained in Ref. [20] using the stretched exponential 
distribution. For the USA average population we obtain 
$\Delta \langle x \rangle = 6.3\%$. In the Brazilian case, 
we obtain $\Delta p = 7.0\%$ and 
$\Delta \langle x \rangle = 9.0\%$. It is interesting to 
remark that the deviations $\Delta \langle x \rangle$ and 
$\Delta p$ could be smaller if the "king" effect were not 
present. 
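
As a worked illustration, the total population $p = \int x\,N(x)\,dx$ and the average $\langle x \rangle = \int x\,N(x)\,dx / \int N(x)\,dx$ can be written in closed form for the cumulative q-exponential $r(x) = r_0 [1 - (1-q)\,a x/q]^{1/(1-q)}$. This is only a sketch under an assumption the text does not state: the integrals are taken over $0 \le x < \infty$. Using $N(x) = -dr/dx$ and $x\,r(x) \to 0$ as $x \to \infty$ (which holds for $1 < q < 2$), integration by parts gives

```latex
\begin{aligned}
\int_0^{\infty} N(x)\,dx &= r(0) = r_0, \\
p = \int_0^{\infty} x\,N(x)\,dx &= \int_0^{\infty} r(x)\,dx
  = r_0 \int_0^{\infty} \left[1 + \frac{(q-1)}{q}\,a x\right]^{-1/(q-1)} dx
  = \frac{r_0\, q}{a\,(2-q)}, \\
\langle x \rangle &= \frac{p}{r_0} = \frac{q}{a\,(2-q)} .
\end{aligned}
```

A different lower integration limit, for instance a minimum city size, would change both expressions, so these formulas should be read as one possible way to obtain the model values rather than as the calculation behind the quoted deviations.
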

FIG. 3. Fit of the cumulative distribution for all cities in 
Brazil. The parameters are $q = 1.7$, $r_0 = 6968.6$, and 
$a = 0.00024$. The coefficient of determination of the nonlinear 
fit is $R^2 = 0.99$. Inset: generalized mono-log plot for 
Brazilian cities. 

In this brief report we show that the population of a 
country (USA and Brazil), distributed among its cities, is 
well described by a q-exponential with $q = 1.7$. This fact 
indicates a possible connection among the previous results, 
Tsallis statistics, and anomalous decay. Furthermore, when 
one deals with a distribution that can be fitted by a 
q-exponential, the generalized mono-log plot introduced here 
gives a practical way to determine the $q$ value, 
independently of the other parameters of the distribution. 